An important step in a document recognition process is the analysis of a document image to extract information about the document that is to be recognized. This analysis can identify the parts of the document that contain text, pictures, and tables, as well as the language, orientation, and logical structure of the document.
Whether the document contains oriental writing (understood primarily to mean Chinese, Japanese, or Korean characters, hereinafter “CJK characters”) is an important piece of information about the document being recognized. Special methods are used for documents that contain CJK characters, both during the analysis of the document image and during character recognition.